usALEX - Corrections - Direct excitation physical parameter

This notebook estimates direct excitation coefficient $d_T$ from μs-ALEX data.

Definitions Memo

$$n_d = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{D_{det}}^{D_{em}} \, (1-E)$$$$n_a = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_A \, \eta_{A_{det}}^{A_{em}} \, E$$$$ n_{aa} = I_{A_{ex}} \, \sigma_{A_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$n_a^* = n_a + Lk + Dir$$

where

$$Lk = I_{D_{ex}} \, \sigma_{D_{ex}}^D \, \phi_D \, \eta_{A_{det}}^{D_{em}} \, (1-E)$$$$Dir = I_{D_{ex}} \, \sigma_{D_{ex}}^A \, \phi_A \, \eta_{A_{det}}^{A_{em}}$$$$\gamma = \frac{\phi_A\,\eta_{D_{det}}^{A_{em}}}{\phi_D\,\eta_{D_{det}}^{D_{em}}}$$

Aim of this notebook

What is already computed?

We previously fitted the leakage and gamma coefficient from the RAW PR values for the 5 dsDNA measurements. We also fitted the direct excitation coefficient expressed (dir_ex_aa) as a function of the A-signal during A-excitation (naa). In symbols, dir_ex_aa is defined as:

$$ d_{AA} = \frac{n_{a}}{n_{aa}}$$

for a A-only population.

What we want to compute?

Alternatively, we can express the direct excitation contribution ($Dir$) as a function of the total corrected burst size:

$$ Dir = d_T\, (n_a + \gamma n_d)$$

With this definition, expressing $d_T$ as a function of the physical parameters we obtain:

$$d_T = \frac{\sigma_{D_{ex}}^A}{\sigma_{D_{ex}}^D} $$

where $\sigma_{Dex}^A$ and $\sigma_{Dex}^D$ are the absorption cross-sections of the Acceptor and Donor dye at wavelength of Donor laser.

Finally, remembering the definition of $\beta$:

$$ \beta = \frac{I_{A_{ex}}\sigma_{A_{ex}}^A}{I_{D_{ex}}\sigma_{D_{ex}}^D}$$

we can express $d_T$ as the product of $\beta$ and $d_{AA}$:

$$ d_T = \beta \, d_{AA}$$

Note that $d_T$ is a property of the Donor-Acceptor dyes pair and of the Donor excitation wavelength. As such, differently from $d_AA$, the $d_T$ coefficient is valid for the same sample in any setup using the same donor excitation wavelength, such as the single-spot μs-ALEX and the multi-spot system. Additionally, $d_T$ allows to correct for direct acceptor excitation using only donor-excitation quantities. Therefore the same correction formula can be used both in two-laser (e.g. single-spot μs-ALEX) and single-laser systems (e.g. 8-spot system).

References:

  • Ingargiola, Antonino. “Applying Corrections in Single-Molecule FRET.” bioRxiv 083287 (2017). DOI:10.1101/083287.

How we compute it?

We use two different procedures both yielding an estimation of $d_T$. Except for the numerical accuracy the two procedures are equivalent.

Procedure 1: Using beta

From the previous relation between $d_T = \beta \,d_{AA}$ is possible to directly estimate $d_T$ with the values of $\beta$ and $d_{AA}$ we already fitted in previous notebooks.

Procedure 2: Correction formula

It is possible to go from the raw $E_R$ (only background correction, no leakage, direct excitation nor gamma) to the fully-corrected $E$ using the formula:

$$ E = f(E_R,\, \gamma,\, L_k,\, d_T) = \frac{E_{R} \left(L_{k} + d_T \gamma + 1\right) - L_{k} - d_T \gamma} {E_{R} \left(L_{k} - \gamma + 1\right) - L_{k} + \gamma}$$
  • See (Ingargiola “Applying Corrections in Single-Molecule FRET.” bioRxiv 083287 (2017). DOI:10.1101/083287) for derivation.

We can compute the corrected $E$ for the 5 dsDNA samples by fitting the fully-corrected histograms (histograms with γ, leakage and direct excitation corrections). We can also fit the 5 $E_R$ values for the same samples from the proximity ratio histograms (only background correction).

Therefore, using the previous formula we can fit $d_T$ (dir_ex_t) by minimizing the error between the 5 $E$ values fitted from corrected histograms and the 5 $E$ values obtained correcting the 5 $E_R$ values from the fit of the proximity ratio histograms.

Loading data

We load the needed libraries and FRETBursts which includes the FRET correction formulas ($E = f(E_R,\, \gamma,\, L_k,\, d_T)$).


In [ ]:
import os
import numpy as np
import pandas as pd
import lmfit
from fretbursts import *
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format='retina'  # for hi-dpi displays
sns.set_style('whitegrid')

Load Raw PR


In [ ]:
#bsearch_ph_sel = 'AND-gate'
bsearch_ph_sel = 'Dex'

data_file = 'results/usALEX-5samples-PR-raw-%s.csv' % bsearch_ph_sel

These are the RAW proximity ratios for the 5 samples (only background correction, no leakage nor direct excitation):


In [ ]:
data_raw = pd.read_csv(data_file).set_index('sample')
data_raw[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]

Columns legend

  • E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
  • E_kde_w, maximum of the KDE computed on the weighted histograms.
  • E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
  • E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
  • n_bursts_all, total number of bursts
  • n_bursts_fret, number of bursts in the FRET population

Load Corrected E

And these are the FRET efficiencies fitted from corrected histograms for the same 5 samples:


In [ ]:
data_file = 'results/usALEX-5samples-E-corrected-all-ph.csv'
data_corr = pd.read_csv(data_file).set_index('sample')
data_corr[['E_gauss_w', 'E_kde_w', 'E_gauss_w_err', 'E_gauss_w_fiterr', 'n_bursts_all', 'n_bursts_fret']]

Columns legend

  • E_gauss_w, center of Gaussian peak, fitted on the weighted histograms.
  • E_kde_w, maximum of the KDE computed on the weighted histograms.
  • E_gauss_w_err is $\frac{\sigma}{\sqrt{n_{FRET}}}$, where $\sigma$ is the std.dev. of the Gaussian peak and $n_{FRET}$ is the number of bursts in the FRET population.
  • E_gauss_w_fiterr is the standard error returned by the non-linear least square fit. It is computed from the Jacobian taking into account the degrees of freedom.
  • n_bursts_all, total number of bursts
  • n_bursts_fret, number of bursts in the FRET population

Load SNA Data


In [ ]:
raw_data_file_sna = 'results/alix/us-ALEX SNA Results 2016-10-12.csv'

In [ ]:
rsna = pd.read_csv(raw_data_file_sna, index_col=0)
rsna

In [ ]:
sna = rsna[['<E>', 'Mode']].round(4)
sna.columns = ['SNA_E_mean', 'SNA_E_max']
sna

In [ ]:
data_file_sna = 'results/usALEX-5samples-E-SNA.csv'

In [ ]:
sna.to_csv(data_file_sna)

Load μs-ALEX corrections


In [ ]:
leakage_coeff_fname = 'results/usALEX - leakage coefficient DexDem.csv'
leakage = np.loadtxt(leakage_coeff_fname)

print('Leakage coefficient:', leakage)

In [ ]:
dir_ex_coeff_fname = 'results/usALEX - direct excitation coefficient dir_ex_aa.csv'
dir_ex_aa = np.loadtxt(dir_ex_coeff_fname)

print('Dir. excitation AA:', dir_ex_aa)

In [ ]:
dir_ex_t_datasheet_fname = 'results/Dyes - ATT0647N-ATTO550 abs X-section ratio at 532nm.csv'
dir_ex_t_datasheet = np.loadtxt(dir_ex_t_datasheet_fname)

print('Direct excitation (dir_ex_t) from datasheet:', dir_ex_t_datasheet)

In [ ]:
gamma_coeff_fname = 'results/usALEX - gamma factor - all-ph.csv'
gamma = np.loadtxt(gamma_coeff_fname)

print('Gamma factor:', gamma)

In [ ]:
beta_coeff_fname = 'results/usALEX - beta factor - all-ph.csv'
beta = np.loadtxt(beta_coeff_fname)

print('Beta factor:', beta)

Procedure 1

Compute $d_T$ using $\beta$ and $d_{AA}$:


In [ ]:
dir_ex_t_beta = dir_ex_aa * beta
'%.5f' % dir_ex_t_beta

In [ ]:
with open('results/usALEX - direct excitation coefficient dir_ex_t beta.csv', 'w') as f:
    f.write('%.5f' % dir_ex_t_beta)

With this coefficient, computing the corrected $E$ for the 5 dsDNA samples we obtain:


In [ ]:
PR_corr_kde = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_beta,
                                                gamma=gamma)*100
PR_corr_kde

In [ ]:
PR_corr_gauss = fretmath.correct_E_gamma_leak_dir(data_raw.E_gauss_w, 
                                                  leakage=leakage, 
                                                  dir_ex_t=dir_ex_t_beta,
                                                  gamma=gamma)*100
PR_corr_gauss

Procedure 2

Datasheet-based direct excitation

The coefficient $d_T$ can be estimated from data-sheet values of $\sigma_{D_{ex}}^A$ and $\sigma_{D_{ex}}^D$.

Using the datasheet values provided by ATTOTec (in PBS buffer) we obtain a $d_T$ estimation close to 10%:


In [ ]:
dir_ex_t_datasheet

With this the corrected $E$ for the 5 dsDNA samples are:


In [ ]:
E_datasheet = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                leakage=leakage, 
                                                dir_ex_t=dir_ex_t_datasheet,
                                                gamma=gamma)*100
E_datasheet

Comparing these values with the ones obtained fitting the corrected E histograms we observe a significant discrepancy:


In [ ]:
out = data_corr[['E_kde_w']].copy()*100
out.columns = ['E_alex']
out['E_datasheet'] = E_datasheet
out

In [ ]:
out.plot(alpha=0.4, lw=3, style=dict(E_alex='-ob', E_datasheet='-sr'));

NOTE: The corrected FRET efficiencies using the datasheet and μs-ALEX-based direct excitation do not match well.

Fitting direct excitation $d_T$


In [ ]:
def residuals_absolute(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                     leakage=leakage, 
                                                     gamma=gamma, 
                                                     dir_ex_t=dir_ex_t)

In [ ]:
def residuals_relative(params, E_raw, E_ref):
    dir_ex_t = params['dir_ex_t'].value
    return (E_ref - fretmath.correct_E_gamma_leak_dir(E_raw, 
                                                      leakage=leakage, 
                                                      gamma=gamma, 
                                                      dir_ex_t=dir_ex_t))/E_ref

In [ ]:
params = lmfit.Parameters()
params.add('dir_ex_t', value=0.05)

In [ ]:
m = lmfit.minimize(residuals_absolute, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)

In [ ]:
m = lmfit.minimize(residuals_relative, params, args=(data_raw.E_kde_w, data_corr.E_kde_w))
lmfit.report_fit(m.params, show_correl=False)

NOTE: The fitted dir_ex_t is 4.5% as opposed to 10.6% as expected from the absorption spectra of ATTO550 and ATTO647 at 532nm.


In [ ]:
'%.5f' % m.params['dir_ex_t'].value

In [ ]:
with open('results/usALEX - direct excitation coefficient dir_ex_t fit.csv', 'w') as f:
    f.write('%.5f' % m.params['dir_ex_t'].value)

In [ ]:
PR_corr_kde_dfit = fretmath.correct_E_gamma_leak_dir(data_raw.E_kde_w, 
                                                     leakage=leakage, 
                                                     dir_ex_t=m.params['dir_ex_t'].value,
                                                     gamma=gamma)*100
PR_corr_kde_dfit.name = 'PR_corr_kde_dfit'
PR_corr_kde_dfit

Corrected E


In [ ]:
E = pd.concat([data_corr[['E_kde_w', 'E_gauss_w']]*100, PR_corr_kde, PR_corr_gauss, sna*100], axis=1)
E.columns = ['E KDE', 'E Gauss', 'PR KDE', 'PR Gauss', 'SNA E mean', 'SNA E max']
E

In [ ]:
E.plot.bar(table=np.round(E, 2).T)
plt.ylabel('FRET (%)')
plt.gca().xaxis.set_visible(False)
#plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);

Estimated errors


In [ ]:
Eerr = (pd.DataFrame().assign(
            E_gauss_w_err = data_corr.E_gauss_w_err, E_gauss_w_fiterr = data_corr.E_gauss_w_fiterr,
            PR_gauss_w_err = data_raw.E_gauss_w_err, PR_gauss_w_fiterr = data_raw.E_gauss_w_fiterr,
            SNA_SDV=rsna['SDV(E)']))*100
Eerr

In [ ]:
nbursts = data_corr[['n_bursts_all', 'n_bursts_fret']]
nbursts

Note on standard errors

The standard error for the Gausian peak fit is reported both as the error returned from the fit routine (*gauss_w_fiterr) and as $\frac{\sigma}{\sqrt{n_{FRET}}}$ (*gauss_w_err, see Columns legend above for more details). The columns starting with E_ are from fit of the corrected FRET histograms. The columns starting with PR_ are from fit of the PR histograms. In principle, the PR errors should be propagated through the correction formula, but this prpagation is not reported here (error is small).

The KDE method does not naturally provide a standard error. In principle the error can be computed using the bootstrap method, which is computationally intensive and is not reported here.

The SNA method returns a distribution of FRET efficiencies that, when shot-noise is added, best fits the experimental histogram. We report estimates of E using the max or the mean of the E distribution. We report the error as the standard deviation of the E distribution. Therefore this value is common to both E estimates (max or mean).

Comparison of different methods

A more empirical way to compute the error is comparing results of the different methods, as done below.


In [ ]:
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(kind='bar')
E[['PR KDE', 'PR Gauss', 'E KDE']].plot(lw=3);
print('Max error E_alex vs E_corr_pr: %.2f' % (E['E KDE'] - E['PR KDE']).abs().max())
print('Max error E_alex vs E_beta:    %.2f' % (E['E KDE'] - E['PR Gauss']).abs().max())
print('Max error E_beta vs E_corr_pr: %.2f' % (E['PR Gauss'] - E['PR KDE']).abs().max())

In [ ]:
x = [int(idx[:-1]) for idx in out.index]
plt.plot(x, 'E KDE', data=E)
plt.plot(x, 'PR KDE', data=E)
plt.plot(x, 'PR Gauss', data=E)
plt.xlabel('Distance in base-pair')
plt.ylabel('FRET');

In [ ]:
E['E KDE'] - E['PR KDE']

NOTE: Fitting $d_T$ to match $E$ from corrected histograms with $E$ from PR correction formula produces a max difference of 1% for the 12d sample. The match is well below the fitting accuracy (> 2%).

Save


In [ ]:
E.to_csv('results/usALEX-5samples-E-all-methods.csv', float_format='%.3f')

In [ ]:
E.round(3)

In [ ]:
Eerr.to_csv('results/usALEX-5samples-E-all-methods_errors.csv', float_format='%.4f')